Accelerate Learning Processes by Avoiding Inappropriate Rules in Transfer Learning for Actor-Critic

Authors

  • Toshiaki TAKANO
  • Haruhiko TAKASE
  • Hiroharu KAWANAKA
  • Shinji TSURUOKA
Abstract

This paper aims to accelerate the learning process of the actor-critic method, one of the major reinforcement learning algorithms, by means of transfer learning. In general, reinforcement learning is used to solve optimization problems: a learning agent autonomously acquires a policy that accomplishes the target task. Solving such problems requires a long learning process of trial and error. Transfer learning is an effective way to accelerate machine learning algorithms; it shortens learning by reusing prior knowledge from a policy for a source task. We propose an effective transfer learning algorithm for the actor-critic method. Two basic issues in transfer learning are how to select an effective source policy and how to reuse it without negative transfer. In this paper, we mainly discuss the latter. We propose a reuse method that builds on a selection method using the forbidden rule set. The forbidden rule set is the set of rules that cause immediate failure of a task; it is used to estimate the similarity between a source policy and the target policy. Agents should not transfer the inappropriate rules in the selected policy. In actor-critic, a policy is constructed from two parameter sets: action preferences and state values. To avoid inappropriate rules, agents reuse only reliable action preferences and state values that imply preferred actions. We perform simple experiments to show the effectiveness of the proposed method. In conclusion, the proposed method accelerates learning processes for the target tasks.
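The abstract does not give the exact selection or reuse criteria, so the following is only a minimal sketch of the general idea for a tabular actor-critic agent. Everything in it is an assumption made for illustration: the function names `forbidden_similarity` and `transfer_policy`, the use of Jaccard overlap between forbidden rule sets as the similarity measure, and the rule that a source action preference counts as "reliable" when it exceeds `pref_threshold` and is not forbidden in the target task.

```python
from typing import Dict, Set, Tuple

State = int
Action = int
Rule = Tuple[State, Action]  # a rule pairs a state with an action


def forbidden_similarity(source: Set[Rule], target: Set[Rule]) -> float:
    """Jaccard overlap of two forbidden rule sets (assumed similarity measure)."""
    union = source | target
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(source & target) / len(union)


def transfer_policy(
    source_prefs: Dict[Rule, float],
    source_values: Dict[State, float],
    target_forbidden: Set[Rule],
    pref_threshold: float = 0.0,
) -> Tuple[Dict[Rule, float], Dict[State, float]]:
    """Copy only the source parameters that pass a hypothetical reliability test."""
    # Keep an action preference only if it marks a preferred action
    # (above the threshold) and the rule is not forbidden in the target task.
    target_prefs = {
        rule: pref
        for rule, pref in source_prefs.items()
        if rule not in target_forbidden and pref > pref_threshold
    }
    # Keep a state value only if at least one transferred preference
    # belongs to that state.
    supported_states = {state for state, _ in target_prefs}
    target_values = {
        state: value
        for state, value in source_values.items()
        if state in supported_states
    }
    return target_prefs, target_values
```

In this sketch, an agent would first rank candidate source policies with `forbidden_similarity`, then initialize its own action preferences and state values from the output of `transfer_policy` and continue ordinary actor-critic learning; parameters that were filtered out simply start from their default values.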

Related resources

Transfer Learning Based on Forbidden Rule Set in Actor-critic Method

In this paper, we aim to accelerate learning processes in the actor-critic method. We propose an effective transfer learning method that reduces training cycles by using information acquired from source tasks. The proposed method consists of two ideas: a method to select a policy to transfer, and a transfer method that considers the characteristics of each actor-critic parameter set. The select...

Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GAN), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions by experts. Then, we incorporate the ...

Learning to Learn: Meta-Critic Networks for Sample Efficient Learning

We propose a novel and flexible approach to meta-learning for learning-to-learn from only a few examples. Our framework is motivated by actor-critic reinforcement learning, but can be applied to both reinforcement and supervised learning. The key idea is to learn a meta-critic: an action-value function neural network that learns to criticise any actor trying to solve any specified task. For sup...

On Actor-Critic Algorithms

In this article, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction, based on information provided by the critic. We show that the features for the critic should ideally span a su...

Adaptive PID Controller Based on Reinforcement Learning for Wind Turbine Control

A self-tuning PID control strategy using reinforcement learning is proposed in this paper to deal with the control of wind energy conversion systems (WECS). Actor-critic learning is used to tune PID parameters adaptively by taking advantage of the model-free and online learning properties of reinforcement learning. In order to reduce the demand of storage space and to impro...

Publication date: 2016